📊 LLM Evals
Specific
model evaluation, benchmarks, evals
Scoured 6554 posts in 12.5 ms
Benchmarking LLM Tool-Use in the Wild
🦙 Local LLM
arxiv.org · 6d · Hacker News
How to diagnose RAG failures from traces
🐛 Fuzzing
siquick.com · 1d · Hacker News
Model API Performance
🕸️ WASM
news.ycombinator.com · 18h · Hacker News
Measuring Human Performance on ARC-AGI-3
📊 Performance Monitoring
arcprize.org · 5h · Hacker News
Building a Robust Documentation Agent with DigitalOcean Gradient AI Platform
🤖 Creative Automation
digitalocean.com · 1d · Hacker News
I "Rewrote" My ORM Again with AI. And Ended Up Benchmarking Every PHP ORM in the Process.
🗄️ Database Internals
technex.us · 14h · Hacker News
Benchmarking LLMs with Marimo Pair
🦙 Local LLM
ericmjl.github.io · 5d · Hacker News
hallengray/rag-forge: Production-grade RAG pipelines with evaluation baked in
🔧 Code Generation
github.com · 11h · Hacker News
AI Frontier Model Tracker with API
🧠 AI
demandsphere.com · 1d · Hacker News
Open Source LLM Comparison
🦙 Local LLM
paradise-runner.github.io · 4d · Hacker News
Burmese-Coder-4B: A Burmese Coding LLM for Low-Resource Language AI
💬 Prompt Engineering
hackernoon.com · 1d
BuildersArk/qobserva: Observability and benchmarking for quantum programs. Local-first, multi-SDK
📊 Performance Monitoring
github.com · 13h · Hacker News
Quantization, LoRA, and the 8% Problem: Benchmarking Local LLMs for Production AI
💬 Prompt Engineering
walsenburgtech.com · 3d · Hacker News
Show HN: Proposal for a real long-term AI memory benchmark
💬 Prompt Engineering
penfieldlabs.substack.com · 5d · Substack
Introducing KellyBench
💬 Prompt Engineering
gr.inc · 3d · Hacker News
Déjà Code: How LLMs Quietly Cheat on Repos They've Already Seen
🐛 Fuzzing
blogs.latentforce.ai · 4d · Hacker News
Bonsai 8B: A 1-Bit LLM That Delivers 8B-Class Performance at 1/14th the Size
🗄️ Database Internals
firethering.com · 6d · Hacker News
Center for Responsible, Decentralized Intelligence at Berkeley
🐛 Fuzzing
rdi.berkeley.edu · 3d · Hacker News
How to Evaluate an AI Persona: Beyond Benchmarks and Vibes
🧠 AI
hackernoon.com · 2d
AXI: Agent EXperience Interface
🔌 LSP Protocol
axi.md · 5d · Hacker News